ABSTRACT


Learning curves are an important tool in cognitive diagnos- 
tics modeling to help assess how well students acquire new 
skills, and to refine and improve knowledge component mod- 
els. Learning curves are typically obtained from a model 
estimated on real data obtained from a finite, and usually 
limited, sample of students. As a consequence, there is some 
uncertainty associated with estimating the model from that 
sample, and a risk that the inferences made using learning 
curves derived from the estimated model are over-confident 
one way or another. Based on previous work modeling the 
uncertainty on Additive Factors Model parameters, we de- 
rive a principled way to quantify the confidence in learning 
curves associated with each knowledge component. We show 
that our approach leads to relatively tight bounds on the 
learning curves, much tighter than a naive approach relying 
only on parameter uncertainty. This also reveals a disparity 
across knowledge components regarding how confident one 
can be in how well these skills are mastered. 
1. INTRODUCTION


Learning curves are a crucial tool for cognitive diagnostics 
modeling. They help build relevant competency frameworks 
to accurately measure learners' skills and to give them meaningful
guidance and feedback in intelligent tutoring systems
(ITSs). More precisely, learning curves measure the rate at
which students, or simulated artefacts [22], acquire competencies.
This makes it possible to evaluate the suitability of a
competency framework (aka Q-matrix) and to compare different
learning systems in a principled way. Learning curves are
“graphs that plots performance on a task versus the number 
of opportunities to practice” [17]. In the educational field, 
learning curves usually take as learning performance metric 
the error rate (or equivalently success rate) when applying 


Cyril Goutte and Guillaume Durand "Confident Learning Curves
in Additive Factor Modeling" In: Proceedings of The 13th
International Conference on Educational Data Mining (EDM
2020), Anna N. Rafferty, Jacob Whitehill, Violetta
Cavalli-Sforza, and Cristobal Romero (eds.) 2020, pp. 424 - 430


Guillaume Durand 
National Research Council Canada 
Guillaume.Durand@nrc-cnrc.gc.ca 


an individual skill or a set of skills. They were empirically 
found to follow a “power law of practice” [18], which means
that the logarithm of the error rate decreases roughly linearly with
the logarithm of the number of practice trials taken (aka
opportunities). Comparing ITSs or sections of an ITS can be done
by considering the steepness of the curve: A steeper curve 
indicates a faster acquisition of the skills practiced [17]. 


However, tracking the performance of skills learned in a mul- 
tidimensional learning environment can be difficult, as those 
environments combine different sets of skills evaluated together.
In such situations, some cognitive diagnostic models
can be useful to compare learning systems but also to understand
the learning mechanisms at play [10]. The Additive
Factors Model (AFM) [1], a well-known cognitive diagnostics
model, does this by assuming that each necessary skill
in an item comes with a skill-specific additive contribution 
towards the probability of success on the item. Fitted AFM 
parameters can also be used to draw learning curves that 
compensate for the attrition bias [9]: Over time, fewer learn- 
ers tend to practice some items because many of them have 
learned the skill, and the curves tend to quickly degenerate, 
impacting the estimates of the slopes and the diagnostics of 
how much learning has occurred. In addition, when learning 
curves are drawn directly from AFM parameters, the valid- 
ity of the inferences that can be made will depend greatly on 
the reliability of the parameter values, and ultimately on
the quality of the fitted data. More precisely, fitted parame- 
ter values tend to compensate for noise, missing values (e.g. 
due to attrition) or mis-specified competency models. Rupp 
and Templin [21] showed for instance how the fitted values 
of model parameters in DINA [11] would inflate when fit- 
ted with purposely erroneous Q-matrices. We can expect a 
similar impact with any model using Q-matrices, including 
AFM, a situation made worse by the fact that, in reality, 
perfect Q-matrices are difficult to identify [5], even when 
they are retro-engineered from performance data [19]. This 
motivates the necessity to estimate not only parameter val- 
ues, but also the statistical confidence on those values, and 
take into account this uncertainty in any model interpre- 
tation, whether based on those values or on the associated 
learning curves. 


Previous work investigated the estimation of standard er- 
rors on DINA [20] or AFM [7] parameters, and showed how 
it could impact the shape of learning curves and ultimately AFM
interpretability and usefulness [15]. Assuming independence 
across parameters, they produced bounds on learning curves 
using standard confidence intervals on parameter values.
However, in practice, the AFM skill parameters (Section 2)
are clearly not independent. In this contribution, we show 
how we can take into account the structure of the covariance 
between the AFM parameters in order to better model and 
control the uncertainty on those parameters. We describe 
a technique for generating confidence intervals on the learn- 
ing curves using a sampling approach. We illustrate how 
this works on several competency models from a well-known 
dataset obtained from a geometry tutoring course, and we 
show how it allows us to compare how different competency 
models may model the same skills with different confidence. 


In the following section, we briefly describe the AFM model
and introduce our method for obtaining more adequate estimates
of the confidence intervals on the learning curves.
Section 3 briefly describes the well-known EDM dataset
that we experiment with in Section 4. Section 5 discusses 
the results and their impact before we conclude. 


2. METHOD 

The Additive Factors Model (AFM) introduced by Cen et al. 
[1, 3] is used in the PSLC-Datashop [12] in order to evaluate 
domain models. It models the probability of success of a 
student $i$ on item $j$ using user- and skill-specific parameters:


$$P(Y_{ij} = 1 \mid \alpha, \beta, \gamma) = \sigma\left(\alpha_i + \sum_{k=1}^{K} \beta_k q_{jk} + \sum_{k=1}^{K} \gamma_k q_{jk} t_{ik}\right) \tag{1}$$

with $\sigma(x) = 1/(1 + e^{-x})$ the logistic function, and


$\alpha_i$ is the proficiency of student $i$,

$\beta_k$ is the easiness of skill $k = 1 \ldots K$,

$\gamma_k$ is the learning rate for skill $k$,

$Q = [q_{jk}]$ is the $J \times K$ Q-matrix, representing the cognitive
model mapping items to skills,

$t_{ik}$ is the number of times student $i$ has practiced skill $k$
(on any item).
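
As a minimal sketch of Eq. 1, the success probability for one student on one item could be computed as below. The function names and the toy parameter values are illustrative assumptions, not taken from the paper or from DataShop.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def afm_success_prob(alpha_i, beta, gamma, q_j, t_i):
    """P(Y_ij = 1) from Eq. 1 for one student i and one item j.

    alpha_i : proficiency of the student (scalar)
    beta    : length-K array of skill easiness values
    gamma   : length-K array of skill learning rates
    q_j     : length-K row of the Q-matrix for item j (0/1 entries)
    t_i     : length-K array, times the student has practiced each skill
    """
    beta, gamma, q_j, t_i = (np.asarray(a, dtype=float)
                             for a in (beta, gamma, q_j, t_i))
    logit = alpha_i + np.sum(beta * q_j) + np.sum(gamma * q_j * t_i)
    return sigmoid(logit)

# Toy values (hypothetical): two skills, the item only requires the first one.
p = afm_success_prob(alpha_i=0.0, beta=[-1.0, 0.5], gamma=[0.3, 0.1],
                     q_j=[1, 0], t_i=[2, 7])
print(f"P(success) = {p:.3f}")
```

Only the skills flagged in the Q-matrix row contribute, so the practice count of the second skill has no effect on this item.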


Parameters $\theta = (\alpha, \beta, \gamma)$ are estimated by maximizing the
(penalized) likelihood of the model over observed student
outcomes (see e.g. [6]). One attractive feature of AFM
is that it easily provides performance curves showing how
students acquire skills. Among the different types of learning
curves that can be derived from AFM [9, 8], we focus on the
data- and student-independent idealized learning curve [8],¹
that simply traces the probability of error for an idealized
student with $\alpha = 0$ proficiency, on an item with a single skill
$k$:


$$LC_k(t) = 1 - P(Y = 1 \mid \alpha = 0, \beta, \gamma) = 1 - \sigma(\beta_k + \gamma_k t). \tag{2}$$
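
The idealized learning curve of Eq. 2 can be sketched in a few lines. The function name and the toy values of the easiness and learning rate are assumptions made for illustration only.

```python
import numpy as np

def idealized_learning_curve(beta_k, gamma_k, opportunities):
    """Error probability 1 - sigma(beta_k + gamma_k * t) for each t (Eq. 2)."""
    t = np.asarray(opportunities, dtype=float)
    # 1 - 1/(1 + exp(-z)) simplifies to 1/(1 + exp(z))
    return 1.0 / (1.0 + np.exp(beta_k + gamma_k * t))

# Toy skill (hypothetical values): fairly hard, moderate learning rate.
curve = idealized_learning_curve(beta_k=-1.0, gamma_k=0.5,
                                 opportunities=np.arange(1, 11))
print(np.round(curve, 3))
```

With a positive learning rate the curve decreases monotonically, and it crosses 50% error at the opportunity where $\beta_k + \gamma_k t = 0$.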


Learning curves are typically computed with the maximum
penalized likelihood parameters $\hat\theta = (\hat\alpha, \hat\beta, \hat\gamma)$. As noted
for example by Philipp et al. [20] and derived for AFM by
Durand et al. [7], one can also estimate the uncertainty on
$(\hat\alpha, \hat\beta, \hat\gamma)$, in the form of standard errors. This is relatively
straightforward as the covariance matrix on parameter estimates
is asymptotically equal to the inverse of the information
matrix, $\mathrm{Cov}(\hat\theta) \approx I_{\hat\theta}^{-1}$. The information matrix $I_{\hat\theta}$ can


¹ aka Individual Learning Curve in [9].


Algorithm 1: Error bars on learning curve for skill k.

Data: Parameters $\hat\theta$, covariance $\mathrm{Cov}(\hat\theta)$
Parameters: Target skill k, simulation sample size N
Result: Error bars for the learning curve for skill k, at
a set of opportunities {t = 1...T}

repeat
    Sample $\theta^{(n)} \sim \mathcal{N}(\hat\theta, \mathrm{Cov}(\hat\theta))$;
    Compute learning curve $LC_k^{(n)}(t)$ for target skill k
until N simulations;
For each opportunity t, compute confidence interval
$[\ell_k(t), u_k(t)]$ using relevant quantiles² of $\{LC_k^{(n)}(t)\}$.


be estimated from first or second order derivatives of the cost
function [20, eq. 3, 4]. This also provides a key to quantifying
the uncertainty on the learning curves. Using the fact
that parameters are (asymptotically) normally distributed
around $\hat\theta$ with the known covariance matrix $\mathrm{Cov}(\hat\theta)$ [7], we
can sample sets of parameters from that multivariate Gaussian
distribution, compute the learning curve for each set
of parameters, then empirically estimate the error bars on
the learning curve through the relevant quantile statistics,
as outlined in Algorithm 1.
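
The sampling procedure of Algorithm 1 might be sketched with NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the function name, the layout of the parameter vector, and the covariance values in the toy example are all assumptions.

```python
import numpy as np

def learning_curve_error_bars(theta_hat, cov, beta_idx, gamma_idx,
                              T=10, n_samples=2000, level=95.0, seed=0):
    """Sampling-based confidence intervals on LC_k(t) for t = 1..T.

    theta_hat : 1-D array of fitted parameters
    cov       : covariance matrix of theta_hat
    beta_idx, gamma_idx : positions of beta_k and gamma_k in theta_hat
    """
    rng = np.random.default_rng(seed)
    theta_hat = np.asarray(theta_hat, dtype=float)
    # Draw N parameter vectors from the asymptotic Gaussian distribution.
    samples = rng.multivariate_normal(theta_hat, cov, size=n_samples)
    t = np.arange(1, T + 1)
    # One sampled learning curve per row: shape (n_samples, T).
    z = samples[:, [beta_idx]] + samples[:, [gamma_idx]] * t
    curves = 1.0 / (1.0 + np.exp(z))
    # Empirical quantiles at each opportunity give the error bars.
    tail = (100.0 - level) / 2.0
    lower, upper = np.percentile(curves, [tail, 100.0 - tail], axis=0)
    return t, lower, upper, curves

# Toy example: point estimates quoted in Section 4.1 for KC#11,
# with a HYPOTHETICAL covariance (sd 0.75 and 0.30, correlation -0.9).
theta_hat = np.array([-2.97, 1.23])
cov = np.array([[0.5625, -0.2025],
                [-0.2025, 0.0900]])
t, lower, upper, curves = learning_curve_error_bars(theta_hat, cov, 0, 1)
for ti, lo, up in zip(t, lower, upper):
    print(f"t={ti:2d}  95% CI = [{lo:.3f}, {up:.3f}]")
```

Because whole parameter vectors are drawn jointly, the correlation between easiness and learning rate is carried through to every sampled curve.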


Although Algorithm 1 focuses on producing error bars on 
the learning curves, we can also use the simulated sample 
to evaluate the stability of the entire learning curve, using 
for example the average standard deviation across opportu- 
nities: 


$$\sigma_k = \frac{1}{T} \sum_{t=1}^{T} \mathrm{st.dev.}\left\{LC_k^{(n)}(t)\right\} \tag{4}$$


Lower $\sigma_k$ indicates that the sampled learning curves are closer
together, and thus that the learning curve is more stable.
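
A possible implementation of this stability measure is sketched below. It assumes the sampled curves are stored one per row, and it uses the sample standard deviation (`ddof=1`), which is an assumption since the text does not specify the estimator.

```python
import numpy as np

def curve_stability(curves):
    """Average over opportunities of the standard deviation across samples.

    curves : array of shape (n_samples, T), one sampled learning curve per row.
    """
    curves = np.asarray(curves, dtype=float)
    per_opportunity_sd = np.std(curves, axis=0, ddof=1)  # shape (T,)
    return float(np.mean(per_opportunity_sd))

# Tiny hand-checkable example: two sampled curves over three opportunities.
toy_curves = [[0.9, 0.5, 0.1],
              [0.7, 0.3, 0.1]]
stability = curve_stability(toy_curves)
print(f"average st.dev. = {stability:.4f}")
```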


3. DATA 

For our experiments, we used the “Geometry Area (1996- 
97)”, a public dataset from DataShop [12]. This dataset 
contains 6778 observations of the performance obtained by 
59 students completing 139 unique items from the “area unit” 
of the Geometry Cognitive Tutor course (school year 1996- 
1997). This dataset has been extensively used [1, 2, 7, 13, 
14]. We selected three knowledge component (KC) models: 


• hLFASearchAICWholeModel3arithO (referred to simply
as arith below),


• hLFASearchModell-context (context below),


• Original (orig below).


These KC models were selected for their reasonable numbers 
of skills and observations but also because they have distinc- 
tive goodness of fit metrics, suggesting that they are high- 
performing KC models. Table 1 shows that the best pre- 
dictive model would be arith. The number of skills (KCs) 
seems to have limited impact on the goodness of fit metrics. 


² For example, the 95% confidence interval is obtained as


Figure 1: Left: Sampled $\beta$ and $\gamma$ for KC#11 of the context model. Right: Corresponding learning curves (in
light gray); the LC given by the AFM model is in red, with 95% confidence intervals at opportunities up to
10 shown as red vertical bars. The 95% CI from [7] is indicated in black crosses for comparison.


Table 1: Characteristics and predictive quality of
the KC models, as computed by PSLC-Datashop.


Name     KCs  Stud.  #Obs.  AIC   BIC   RMSE
arith    18   59     5104   4948  5569  .397
context  12   59     5104   5030  5573  .399
orig     15   59     5104   5180  5762  .407


Another motivation for choosing these KC models is that they
share skills: some skills have an identical mapping to
items in another model, which makes it possible to compare the
stability of the same skill across KC models.


4. EXPERIMENTS 


In this section, we first illustrate how we derive error bars 
on the learning curve for a specific KC, then show results for 
an entire KC model, and finally we compare the stability of 
learning curves for equivalent skills in different KC models. 


4.1 Illustration 

We focus on KC#11 (equi-tri-height-from-base/side)
from KC-model context. This is a relatively hard ($\beta = -2.97$)
skill, but with quick learning ($\gamma = 1.23$). Figure 1
(left) shows the values of $\beta_{11}$ and $\gamma_{11}$ that were sampled by
Algorithm 1 for this KC. As seen in the plot, the marginal
uncertainty on $\beta_{11}$ and $\gamma_{11}$ is quite high (from -4.5 to -1.5 for
$\beta_{11}$), but they are also strongly correlated: samples with higher
easiness have lower learning rate.


Each of the points in Fig. 1 (left) is translated into a corresponding
learning curve (Eq. 2) in dotted light gray in Fig.
1 (right). Due to the correlation noted before, we can see
that the sampled learning curves are actually fairly stable,
compared to what extremes of the distributions of $\beta_{11}$ and
$\gamma_{11}$ would suggest (see dashed lines with crosses in Fig. 1,


$[q_{2.5}, q_{97.5}]$, where $q_\xi$ is such that $\xi$% of the sample is below
$q_\xi$ and $(100 - \xi)$% is above.


which replicates Fig. 4 from [7]). The red curve in Fig- 
ure 1 (right) is the learning curve computed from the AFM 
solution, with 95% confidence intervals obtained from the 
sample at each opportunity indicated as red bars. We see 
that although there is some uncertainty around the steep 
part of the curve, the learning curve is well-controlled and 
easy to diagnose, indicating that the skill is completely ac- 
quired after around 5 opportunities. 
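
As a quick numerical check of this reading, the point-estimate curve can be evaluated directly from Eq. 2 with the values quoted above ($\beta = -2.97$, $\gamma = 1.23$). Counting opportunities from $t = 0$ (no prior practice) is an assumption made here for illustration.

```python
import math

beta_11, gamma_11 = -2.97, 1.23  # fitted values quoted in the text

# Error probability 1 - sigma(beta + gamma * t) = 1 / (1 + exp(beta + gamma * t))
errors = [1.0 / (1.0 + math.exp(beta_11 + gamma_11 * t)) for t in range(8)]

for t, err in enumerate(errors):
    print(f"t={t}: error probability = {err:.3f}")
```

Under these values the predicted error starts at about 95%, is still above 12% after four practice opportunities, and falls below 5% at the fifth, which is consistent with the skill being acquired after around 5 opportunities.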


4.2 Application to KC models 

We now show how we can generate learning curves with con- 
fidence intervals for a full KC model. The process illustrated 
above is applied to each KC, producing one learning curve 
with confidence bounds. For improved readability, we show 
the results on KC-model context, which has the smallest 
number of KCs among our three models. 


Figure 2 shows the learning curves for the twelve knowledge 
components. We can see that most learning curves are well-controlled.
The average standard deviation $\sigma$, depending on
the skill, ranges from 2% to 8%. “Flat” KCs tend to have
lower uncertainty, which is understandable: when the error
rate for a skill is low and flat, this is easy for the model to
pick up with confidence by predicting high success (high $\beta$)
for that skill.


4.3 Comparison of KC models
By better estimating and controlling the uncertainty in learn- 
ing curves, we can more reliably compare how skills are ac- 
quired according to different KC models. 


In Figure 3 we show the same skill, compose-by-multiplication, 
as modeled by the 12-skill model context, and by the 15- 
skill model orig. The shapes of the learning curves are very 
similar, which is not surprising as both KCs are associated
with the same items, and estimated from the same student outcomes.
Despite differences due to the influence of other KCs
in the models, the resulting values of $\beta$ and $\gamma$ are similar.
The error bars, however, show that the confidence is slightly
better in the orig model, showing an average dispersion of
around 3.5% error across the learning curve (versus 4.3% in
context). This shows that even in a model with more KCs,
learning curves can be modelled with higher confidence.


Figure 3: KC compose-by-multiplication from KC models context (left) and orig (right). $\sigma$ is the average
uncertainty across opportunities (lower is better).


Our second example, in Figure 4, compares similar skills,
compose-subtract from arith, and Subtract from orig. Again,
the general shapes of the learning curves are similar, due to
similar values for the estimated $\beta$ and $\gamma$ in each model.³
The sampled learning curves also seem quite similar, suggesting
that both KC models represent that skill with similar
levels of confidence. This is confirmed by the value of the
average dispersion, which is 5.4% for one model and 5.1%
for the other. We see again that the different number of
KCs has limited impact on how confident the models are on
a particular skill.


Figure 4: KC compose-subtract from model arith (left) and KC Subtract from orig (right). $\sigma$ is the average
uncertainty across opportunities (lower is better).


Figure 5: Structure of the correlation between $\beta$ (y-axis)
and $\gamma$ (x-axis) for all KCs in model context.


³ For arith, $\beta = .588 \pm .524$ and $\gamma = .329 \pm .200$, while for
context, $\beta = .576 \pm .523$ and $\gamma = .336 \pm .200$.


5. DISCUSSION


Figure 1 (left) showed that there is a strong correlation between
the sampled values of $\beta_{11}$ and $\gamma_{11}$. The impact of this
correlation on the actual learning curve is that, according to
the model, this knowledge component can be modeled by a
higher easiness (starting with lower error) and lower learning
rate (flatter curve), or by a lower easiness and higher
learning rate (i.e. starting higher but dropping faster). This
finding actually generalizes to the entire KC model, as shown
by the correlation matrix in Figure 5. We see that there is a
consistently strong negative correlation between the $\beta$ and $\gamma$
parameters for each knowledge component, due to this compensatory
mechanism. There are also some correlations between
parameters of different KCs, which may suggest some
compensatory effects in the AFM model.
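
A correlation matrix such as the one discussed here can be obtained from the parameter covariance matrix by rescaling with the standard errors. The sketch below assumes a covariance matrix is already available; the 2x2 example values are hypothetical and only illustrate a strongly negative easiness/learning-rate correlation.

```python
import numpy as np

def cov_to_corr(cov):
    """Convert a covariance matrix into the corresponding correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))       # standard errors of each parameter
    return cov / np.outer(sd, sd)    # corr_ij = cov_ij / (sd_i * sd_j)

# Hypothetical covariance of (beta_k, gamma_k) for a single skill.
cov = np.array([[0.5625, -0.2025],
                [-0.2025, 0.0900]])
corr = cov_to_corr(cov)
print(np.round(corr, 2))
```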
One straightforward outcome of this work is that the proposed
method provides a much better estimate of the confidence
in a learning curve than the method proposed in [7],
which relied on the marginal distribution of AFM parameters
$\beta$ and $\gamma$ and used the boundaries of standard confidence
intervals on each parameter independently. We included
their 95% confidence interval as black crosses in Figure 1: 
that suggests that the uncertainty on the learning curve is 
high up to 8 or more opportunities. By contrast, our ap- 
proach shows that the actual uncertainty is much better 
controlled, and that the skill is essentially learned by op- 
portunity 5 or 6. 
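
The gap between the two kinds of bounds can be illustrated on a toy case. The sketch below uses one plausible reading of the marginal approach (both parameters pushed to the ends of their individual 95% intervals) and compares it with sampling from the joint distribution. The standard errors and the correlation are hypothetical values chosen for illustration, not estimates from the dataset.

```python
import numpy as np

def error_curve(beta, gamma, t):
    """Error probability 1 - sigma(beta + gamma * t)."""
    return 1.0 / (1.0 + np.exp(beta + gamma * t))

beta_hat, gamma_hat = -2.97, 1.23          # values quoted in Section 4.1
sd_beta, sd_gamma, rho = 0.75, 0.30, -0.9  # HYPOTHETICAL, illustration only
cov = np.array([[sd_beta**2, rho * sd_beta * sd_gamma],
                [rho * sd_beta * sd_gamma, sd_gamma**2]])
t = np.arange(1, 11)

# Marginal bounds: both parameters at the low (or high) end of their own CI.
naive_upper = error_curve(beta_hat - 1.96 * sd_beta,
                          gamma_hat - 1.96 * sd_gamma, t)
naive_lower = error_curve(beta_hat + 1.96 * sd_beta,
                          gamma_hat + 1.96 * sd_gamma, t)

# Sampling-based bounds: draw (beta, gamma) jointly, keeping the correlation.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal([beta_hat, gamma_hat], cov, size=5000)
curves = error_curve(draws[:, [0]], draws[:, [1]], t)
lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)

naive_width = naive_upper - naive_lower
sampled_width = upper - lower
for ti, nw, sw in zip(t, naive_width, sampled_width):
    print(f"t={ti:2d}  marginal width={nw:.3f}  sampled width={sw:.3f}")
```

With a strong negative correlation, a low easiness is almost always paired with a high learning rate, so the extreme corner combinations used by the marginal bounds are very unlikely under the joint distribution.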


In this paper, we have worked with the basic learning curve 
called the individual learning curve in [9] or the idealized 
learning curve in [8]. We note that this work can be applied 
to any learning curve that relies on the parameters of the 
AFM model. This includes in particular the completed learning
curve [9], where empirical observations of success/failure
are completed by model estimates.


In previous work, Harpstead and Aleven [10] used empirical 
learning curve analysis to inform educational game design. 
They derive empirical curves and AFM-fitted curves, with 
standard errors on the curves, using a completely different 
approach from ours. Contrary to the approach advocated 
here, which relies on the core uncertainty on model param- 
eters resulting from a maximum (penalized) likelihood es- 
timation, their learning curves and error bars are obtained 
using non-parametric smoothing (LOESS [4], presumably 
from the stat_smooth function of the ggplot2 R package).
On the empirical measurements of success, this produces 
learning curves that are based on observations alone, and 
therefore may not have the desirable properties enforced by 
the AFM model, such as monotonicity (decreasing learning 
curves). On the fitted AFM predictions, those properties 
are enforced and apparent from the learning curves. Two 
key differences with our approach, however, are: 


1. The use of fitted AFM values to produce error rate
predictions does not take into account the uncertainty
in parameter values due to estimation from a finite
sample, and


2. The width of the error bars is directly impacted by
the number of students at each opportunity, typically
resulting in widening error bars as attrition kicks in.
By contrast our sampling-based algorithm often yields
narrowing error bars as opportunities increase and the
error rates near zero (for all sampled parameters).


⁴ Blue curves in [10], Figs 3, 4 and 7.


A more systematic comparison of our approach with
the non-parametric smoothing of model estimates
would require further study. Combining
both approaches in order to take into account the
uncertainty due to parameter estimation and the sampling
uncertainty across the finite set of students seems particularly
promising.


6. CONCLUSION 


In this contribution, we provided a principled way to esti- 
mate and control the confidence in learning curves derived 
from the Additive Factors Model. Error bars on the learn- 
ing curves account for the statistical uncertainty associated 
with estimating the AFM model from a finite set of students.
They allow a more accurate and more confident
interpretation of how skills are acquired by students. We showed
how this makes it possible to characterize learning for all skills of a
KC model of a geometry tutoring course. We also showed
how modeling the confidence of learning curves can help 
compare how two different KC models represent the same 
skill. Our approach was illustrated here on one type of learn- 
ing curve, but it can be applied to any alternative learning 
curve, as long as it can be computed from the usual AFM 
parameters. In addition, the same idea can be applied in 
a straightforward way to any cognitive diagnostic model for 
which a covariance on parameters can be computed. This
includes, in particular, models estimated by penalized maximum
likelihood. For instance, the Individualized-slope Additive
Factors Model (iAFM) [16], which extends AFM with a
student learning rate, could be an excellent candidate for our
method, especially as its authors noticed that the iAFM “[student]
learning rate is significantly related to estimates of student
ability”. Finally, our hope is that this work will help spread
the use of learning curves with well-controlled confidence 
among practitioners of AFM. 